Runtime Compression of MPI Messages to Improve the Performance and Scalability of Parallel Applications
Authors
Abstract
Communication-intensive parallel applications spend a significant amount of their total execution time exchanging data between processes, which leads to poor performance in many cases. In this paper, we investigate message compression in the context of large-scale parallel message-passing systems to reduce the communication time of individual messages and to improve the bandwidth of the overall system. We implement and evaluate the cMPI message-passing library, which quickly compresses messages on-the-fly with a low enough overhead that a net execution time reduction can be obtained. Our results on six large-scale benchmark applications show that execution speed improves by up to 98% when message compression is enabled.
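The core idea of on-the-fly message compression can be sketched as follows. The abstract does not specify cMPI's codec or API, so this illustrative example uses Python's `zlib` at its fastest level as a stand-in for a low-overhead compressor; the function names are hypothetical:

```python
import zlib

def compress_message(payload: bytes, level: int = 1) -> bytes:
    """Compress an outgoing message buffer; level 1 trades ratio for speed,
    keeping the compression overhead small relative to transfer time."""
    return zlib.compress(payload, level)

def decompress_message(compressed: bytes) -> bytes:
    """Restore the original message buffer on the receiving side."""
    return zlib.decompress(compressed)

if __name__ == "__main__":
    # Repetitive scientific payloads often compress well, so fewer
    # bytes cross the network for the same logical message.
    message = b"\x00\x01\x02\x03" * 4096
    wire = compress_message(message)
    assert decompress_message(wire) == message
    print(f"original={len(message)} bytes, on the wire={len(wire)} bytes")
```

A net speedup is only obtained when the time saved by sending fewer bytes exceeds the compress/decompress cost, which is why a fast, low-ratio codec setting is the natural choice here.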
Similar papers
Topic 13: High Performance Network and Communication
This topic on High-Performance Network and Communication is devoted to communication issues in scalable compute and storage systems, such as parallel computers, networks of workstations, and clusters. All aspects of communication in modern systems were solicited, including advances in the design, implementation, and evaluation of interconnection networks, network interfaces, system and storage ...
Performance Modeling of Gyrokinetic Toroidal Simulations for a Many-Tasking Runtime System
Conventional programming practices on multicore processors in high performance computing architectures are not universally effective in terms of efficiency and scalability for many algorithms in scientific computing. One possible solution for improving efficiency and scalability in applications on this class of machines is the use of a many-tasking runtime system employing many lightweight, con...
Reducing Communication Time through Message Prefetching
The latency of large messages often leads to poor performance of parallel applications. In this paper, we investigate a novel latency reduction technique where message receivers prefetch messages from senders before the matching sends are called. When the send is finally called, only the parts of the message that have changed since the prefetch need to be transmitted, resulting in a smaller mes...
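The "send only what changed since the prefetch" idea amounts to a simple chunk-level delta scheme. The paper's actual protocol is not given in this summary; the sketch below assumes fixed-size chunks and equal-length buffers, with hypothetical function names:

```python
CHUNK = 64  # assumed chunk granularity for change detection

def diff_chunks(prefetched: bytes, current: bytes, chunk: int = CHUNK):
    """Return (offset, data) pairs for chunks that differ from the
    copy the receiver already prefetched."""
    deltas = []
    for off in range(0, len(current), chunk):
        if current[off:off + chunk] != prefetched[off:off + chunk]:
            deltas.append((off, current[off:off + chunk]))
    return deltas

def apply_chunks(prefetched: bytes, deltas) -> bytes:
    """Patch the prefetched copy with the transmitted deltas to
    reconstruct the sender's current buffer."""
    buf = bytearray(prefetched)
    for off, data in deltas:
        buf[off:off + len(data)] = data
    return bytes(buf)
```

If only a small region of a large buffer changed between the prefetch and the matching send, the deltas are far smaller than the full message, which is the source of the latency reduction described above.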
Constructing Resiliant Communication Infrastructure for Runtime Environments
Next generation HPC platforms are expected to feature millions of cores distributed over hundreds of thousands of nodes, leading to scalability and fault-tolerance issues for both applications and runtime environments dedicated to run on such machines. Most parallel applications are developed using a communication API such as MPI, implemented in a library that runs on top of a dedicated runtime...
Performance of Multicore Systems on Parallel Datamining Services
Multicore systems are of growing importance and 64-128 cores can be expected in a few years. We expect datamining to be an important application class of general importance and are developing such scalable parallel algorithms for managed code (C#) on Windows. We present a performance analysis that compares MPI and a new messaging runtime library CCR (Concurrency and Coordination Runtime) with Wi...